Abstract

Permutation entropy quantifies the diversity of possible orderings of the values a 
random or deterministic system can take, as Shannon entropy quantifies the diver- 
sity of values. We show that the metric and permutation entropy rates — measures 
of new disorder per new observed value — are equal for ergodic finite-alphabet in- 
formation sources (discrete-time stationary stochastic processes). With this result, 
we then prove that the same holds for deterministic dynamical systems defined by 
ergodic maps on n-dimensional intervals. This result generalizes a previous one for 
piecewise monotone interval maps on the real line (Bandt, Keller and Pompe, "En- 
tropy of interval maps via permutations" , Nonlinearity 15, 1595-602, (2002)), at the 
expense of requiring ergodicity and using a definition of permutation entropy rate 
differing in the order of two limits. The case of non-ergodic finite-alphabet sources 
is also studied and an inequality developed. Finally, the equality of permutation and 
metric entropy rates is extended to ergodic non-discrete information sources when 
entropy is replaced by differential entropy in the usual way. 



1 Introduction 



The entropy rate is a key parameter associated with stochastic processes, in- 
formation sources and dynamical systems. Roughly speaking, the entropy rate 
quantifies the average uncertainty, disorder or irregularity generated by a
process or system per 'time' unit, and it is the primary subject of fundamental
results in information and coding theory (Shannon's noiseless coding theorem)
and statistical mechanics (second law of thermodynamics). It is not surpris- 
ing, therefore, that this notion, appropriately generalized and transformed, is 
ubiquitous in many fields of mathematics and science when randomness or 
'random-like' behavior is at the heart of the theory or model being studied. 

For definiteness consider a stationary information source emitting a time
series of observed values from a continuous state space; formally, these are
draws from the random variables $X_1, \ldots, X_n$. Since the realization of a
non-discrete random variable cannot be observed exactly (this would mean an
infinite amount of information), the observer has to content himself with a 
finite degree of accuracy. Generally speaking, the metric or Shannon entropy 
rate of an information source is the rate of new information it generates per 
unit time (as the metric or Kolmogorov-Sinai entropy rate of a deterministic 
dynamical system is a measure of its pseudo-randomness or chaotic behavior). 
Given a certain discretization scale $\Delta$ of the state space, the metric (Shannon)
entropy rate $h_m$ of the discretized information source
$\mathbf{X}^\Delta = (X_n^\Delta)_{n\in\mathbb{N}}$ is

$$h_m(\mathbf{X}^\Delta) = \lim_{L\to\infty} \frac{1}{L} H_m\big((X^\Delta)_1^L\big),$$

with $(X^\Delta)_1^L = X_1^\Delta \ldots X_L^\Delta$ a length-$L$ word of symbols $X_n^\Delta$ discretized at
resolution $\Delta$ from $X_1^L = X_1 \ldots X_L$. We use $H_m(Z)$ for the entropy of the discrete
random variable $Z$, i.e., $H_m(Z) = H_m(\Pr(Z)) = -\sum_z \Pr(z) \log_2 \Pr(z)$ for the
probability distribution $\Pr(z)$ of $Z$. We come back to the metric entropy and
entropy rate in the next section, where we set the conceptual background of
this paper on a more formal footing.

Consider a length-$L$ word of observables $(X^\Delta)_1^L$. Assuming there exists a natural
order relation on the state space of the source $\mathbf{X}^\Delta$ (e.g., real scalars or vectors
with a defined lexicographic ordering), each block of observations $(X^\Delta)_1^L$ selects
one particular permutation $\Pi$ out of the $L!$ possible permutations. For example,
if $X_2^\Delta < X_1^\Delta < X_3^\Delta$, then the corresponding permutation can be expressed
explicitly as $\Pi((X^\Delta)_1^3) = (2, 1, 3)$. Note that the mapping from $\mathbf{X}^\Delta$-orderings to
permutations can be many-to-one when there are repeated values; to overcome
this shortcoming, we will use 'ranks' below (see Sect. 3), so that words defining
the same permutation have the same rank variables which, in turn, can be
identified with the corresponding permutation. Bandt and Pompe [3] defined
the permutation entropy of order $L$ as (see footnote 1)

$$H^*_m\big((X^\Delta)_1^L\big) = -\frac{1}{L-1} \sum \Pr\big(\Pi((X^\Delta)_1^L)\big) \log \Pr\big(\Pi((X^\Delta)_1^L)\big),$$

with $\Pr\big(\Pi((X^\Delta)_1^L)\big)$ being the probability of observing any particular
permutation given a block of observables. In direct analogy to the Shannon entropy
rate, the permutation entropy rate at resolution $\Delta$ is hence defined as (following
the notation of [2])

$$h^*_m(\mathbf{X}^\Delta) := \lim_{L\to\infty} H^*_m\big((X^\Delta)_1^L\big). \tag{1}$$

1 The factor $1/(L-1)$ is used instead of $1/L$ because $\Pi((X^\Delta)_1^1) = 1$ contributes nothing
to the entropy. This choice is, of course, inconsequential when $L \to \infty$, but it is
preferable for numerical simulations and the applications we discuss in the last
section.
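As a small illustration of how a block of observations selects a permutation, the following Python sketch returns the 1-based indices that sort a block in increasing order, which matches the convention of the $(2, 1, 3)$ example above. The function name `ordinal_pattern` and the tie-breaking rule (earlier index first) are assumptions made here for illustration; they are not taken from the original text, which handles repeated values through rank variables in Sect. 3.

```python
def ordinal_pattern(block):
    """Return the permutation (1-based indices) that sorts `block` increasingly.

    Ties are broken by position (earlier index first). This is an
    illustrative assumption: with repeated values the ordering is ambiguous.
    """
    # sorted() is stable, so equal values keep their original relative order
    order = sorted(range(len(block)), key=lambda i: block[i])
    return tuple(i + 1 for i in order)


# x_2 < x_1 < x_3, so the block selects the permutation (2, 1, 3)
print(ordinal_pattern([0.5, 0.2, 0.9]))
```

Counting how often each pattern occurs over many blocks gives the probabilities that enter the permutation entropy of order $L$.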

For deterministic maps $f$ of a proper interval $I \subset \mathbb{R}$ with a finite number of
monotonicity segments, Bandt, Keller and Pompe [2,3] analytically and numerically
investigated a permutation entropy rate we denote by $h^{*BKP}_\mu(f)$, based on
the entropy of certain partitions, proving that it exists and, in fact, equals the
metric (Kolmogorov-Sinai) entropy rate $h_m(f)$. They also prove this equality
for the topological versions of permutation and ordinary entropy rates. Relative
changes in $h^{*BKP}_\mu$ estimated numerically from time series of the logistic
map tracked the behavior of $h_m$ (estimated directly from the positive Lyapunov
exponent of the map) very well over a wide range of the nonlinearity parameter.
There remained a substantial bias, though it was nearly constant over parameters.

The correspondence observed in [3] between permutation entropy and met- 
ric entropy rates of time series is not coincidental, nor restricted to one- 
dimensional dynamics. Under only the assumption of ergodicity, we show that 
the permutation entropy rate of stationary, finite-alphabet random processes 
equals the metric entropy rate. A similar result follows for the permutation and 
metric differential entropy rates of non-discrete sources. With these results on 
stochastic processes in hand, we further show that for ergodic maps on
$d$-dimensional intervals $I^d$ the two entropy rates are also equal. In doing so, we
define the permutation entropy rate as $h^*_\mu(f) = \lim_{\Delta\to 0} h^*_m(\mathbf{X}^\Delta)$, where $\mathbf{X}^\Delta$
now stands for the 'simple observations' of $f$ supplied by a discretization of
$I^d$ with resolution $\Delta$, that is, a finite-state stochastic process. The generality of all
these results gives strong support to our approach, which provides a unified
treatment for stochastic and deterministic dynamical systems.

This paper is organized as follows. For the reader's convenience we review in
Sect. 2 the theoretical background and fix the notation. Sect. 3 contains one
of the main results of this paper, namely, $h_m = h^*_m$ for ergodic finite-alphabet
stochastic processes (Theorem 2). This result is generalized in Sect. 4 to
non-discrete ergodic information sources using the differential entropy rate
(Theorem 3) and, in Sect. 5, to maps on $d$-dimensional intervals (Theorem
4), the main result on finite-dimensional maps. We also mention in Sect. 3 that
$h^*_m \ge h_m$ for non-ergodic finite-alphabet sources; the proof can be found in
Appendix B. Sect. 6 contains a discussion of the two definitions of
permutation entropy. Finally, in Sect. 7 we show some numerical examples
and discuss open practical issues in using permutation entropies in time-series
analysis.



2 Theoretical framework 

2.1 Stochastic processes and dynamical systems

Let $\mathbb{R}^{d\mathbb{N}} = \{x = (x_n)_{n\in\mathbb{N}} : x_n \in \mathbb{R}^d\}$, $\mathcal{B}$ the product sigma-algebra of $\mathbb{R}^{d\mathbb{N}}$
generated by the Borel sets of $\mathbb{R}^d$, and $\sigma$ the (left) shift transformation on $\mathbb{R}^{d\mathbb{N}}$,
$(\sigma x)_n = x_{n+1}$. Let $(\Omega, \mathcal{F}, \mu)$ be a probability space, i.e., $\Omega$ is a nonempty set,
$\mathcal{F}$ is a sigma-algebra of subsets of $\Omega$ and $\mu$ is a (positive) measure on $(\Omega, \mathcal{F})$.
Any stationary stochastic (or random) process in discrete time $\mathbf{X} = (X_n)_{n\in\mathbb{N}}$
on the probability space $(\Omega, \mathcal{F}, \mu)$ with values in $\mathbb{R}^d$ corresponds in a standard
way to the shift dynamical system $(\mathbb{R}^{d\mathbb{N}}, \mathcal{B}, m, \sigma)$ via the map $\phi : \Omega \to \mathbb{R}^{d\mathbb{N}}$
defined by $(\phi\omega)_n = X_n(\omega)$, $n \in \mathbb{N}$. The probability measure $m$ is defined on
the Borel sets $B$ of $\mathbb{R}^{d\mathbb{N}}$ by

$$m(B) := \mu(\phi^{-1}B)$$

($\phi^{-1}B \in \mathcal{F}$ because $X_n$ is $\mathcal{F}$-measurable for all $n$) and it is $\sigma$-invariant (i.e.,
$m \circ \sigma^{-1} = m$) because of the stationarity of $\mathbf{X}$. The measure $m$ is sometimes
called the induced probability measure or distribution on the space of possible
outputs of the random process. Moreover, if $\pi_n : \mathbb{R}^{d\mathbb{N}} \to \mathbb{R}^d$ is the projection
onto the $n$th component, $\pi_n x = x_n = X_n(\omega)$ (or $\pi_n = X_n \circ \phi^{-1}$), then the
'sampling function' $\pi = (\pi_n)$ has the same joint distributions on $\mathbb{R}^{d\mathbb{N}}$ as $\mathbf{X} =
(X_n)$ on $\Omega$, i.e., both processes are equivalent. Any point $x$ of the state space
$\mathbb{R}^{d\mathbb{N}}$ is a possible realization (or 'sample path') of the whole process. Such
one-sided random processes provide better models than the two-sided processes
$(X_n)_{n\in\mathbb{Z}}$ for physical information sources, which must be turned on at some time,
and thus we will use both denominations (random process and information
source) interchangeably in this paper.

We will also refer to the shift dynamical system $(\mathbb{R}^{d\mathbb{N}}, \mathcal{B}, m, \sigma)$ as the (sequence
space) model of the stochastic process or information source $\mathbf{X}$. Sometimes
$\mathbb{Z}_+ = \{0, 1, \ldots\}$ is used instead of $\mathbb{N}$ to number the random variables $X_n$
and their samples $x_n$ (we do so in Sect. 4). Models allow one to focus on the
random process itself as given by the probability distribution on its outputs,
dispensing with a perhaps complicated underlying probability space. As usual,
we will also identify $X_n$ with $\pi_n = \pi \circ \sigma^n$.

Finite-state or finite-alphabet sources $\mathbf{S} = (S_n)_{n\in\mathbb{N}}$ on $(\Omega, \mathcal{F}, \mu)$, where $S_n :
\Omega \to A$ with alphabet $A = \{a_1, \ldots, a_{|A|}\}$, are dealt with in a similar way
to the previous, non-discrete sources and, as a matter of fact, most of the
general setup, properties and observations above apply mutatis mutandis to
this simpler case. The sequence space of the corresponding model is now
$A^{\mathbb{N}} = \{s = (s_n)_{n\in\mathbb{N}} : s_n \in A\}$, $A$ being endowed with the discrete topology;
let $\mathcal{Z}$ be the product sigma-algebra of $A^{\mathbb{N}}$ generated by the elements of
$A$. Since no confusion will arise, we continue denoting by $\sigma$ the shift on $A^{\mathbb{N}}$,
$(\sigma s)_n = s_{n+1}$, and by $m$ the $\sigma$-invariant measure on $(A^{\mathbb{N}}, \mathcal{Z})$ defined as the
pushforward of $\mu$ by the map $\phi : \Omega \to A^{\mathbb{N}}$, $(\phi\omega)_n = S_n(\omega)$. The finite order
probability distribution of $\mathbf{S}$, $\Pr(S_{i_1} = s_{i_1}, \ldots, S_{i_n} = s_{i_n}) =: \Pr(s_{i_1}, \ldots, s_{i_n})$,
can be alternatively expressed by means of the probability distribution on the
outputs of $\mathbf{S}$,

$$\Pr(s_{i_1}, \ldots, s_{i_n}) = \mu\{\omega \in \Omega : S_{i_1}(\omega) = s_{i_1}, \ldots, S_{i_n}(\omega) = s_{i_n}\}
 = m\{\xi \in A^{\mathbb{N}} : \xi_{i_1} = s_{i_1}, \ldots, \xi_{i_n} = s_{i_n}\} \tag{2}$$

for any $i_1, \ldots, i_n \in \mathbb{N}$ and $s_{i_1}, \ldots, s_{i_n} \in A$.

In this paper we will consider mostly finite-alphabet sources, although these
will also occasionally arise as discretizations or quantizations $\mathbf{X}^\Delta$ of sources
$\mathbf{X}$ taking values on a proper interval $I^d$ of $\mathbb{R}^d$ ($I^d \subset \mathbb{R}^d$ in symbols) endowed
with Lebesgue measure $\lambda$. Formally, this means that there exists a (usually
uniform) partition $\delta = \{\Delta_1, \ldots, \Delta_{|\delta|}\}$ of $I^d$ into a finite number of $\lambda$-measurable
subsets such that $X_n^\Delta$ is the discrete random variable defined by

$$\Pr(X_n^\Delta = i) = \mu\{\omega \in \Omega : X_n(\omega) \in \Delta_i\}
 = m\{\xi \in (A^\Delta)^{\mathbb{N}} : \xi_n = i\} = \int_{\Delta_i} dF(x),$$

where $F(x) = \Pr(X_n \le x) = \mu\{\omega \in \Omega : X_n(\omega) \le x\}$ is the distribution
function common to all $X_n$ (in case $X_n$ is a vector random variable, the inequality
is understood component-wise), $m$ is the induced probability measure on the
outputs and $A^\Delta = \{1, \ldots, |\delta|\}$ is the alphabet of $\mathbf{X}^\Delta$. If $X_n$ has a density
function $p(x)$ (formally, the Radon-Nikodym derivative of $F$ with respect to
$\lambda$), then $\Pr(X_n^\Delta = i) = \int_{\Delta_i} p(x)\,dx$. Distribution functions and densities of
higher finite order are analogously defined. For $\Delta$, the 'discretization scale' or
'resolution' we referred to in the Introduction, one can take any measure of
the 'coarseness' of $\delta$, say, the largest diameter of its elements, also called the
norm of $\delta$, $\|\delta\|$.
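The quantization step can be sketched in a few lines of Python for the scalar case. This is only an illustration under stated assumptions: the interval is taken to be half-open, $[lo, hi)$, the partition is uniform, and the function name `discretize` is hypothetical.

```python
import math


def discretize(x, lo, hi, n_bins):
    """Map x in [lo, hi) to a symbol in {1, ..., n_bins} using equal-width bins.

    Assumes a uniform partition of a scalar interval; the symbol i means
    that x falls in the i-th bin, counted from the left.
    """
    if not (lo <= x < hi):
        raise ValueError("x must lie in [lo, hi)")
    width = (hi - lo) / n_bins
    # min() guards against floating-point round-off at the upper edge
    return min(int(math.floor((x - lo) / width)) + 1, n_bins)


samples = [0.03, 0.49, 0.5, 0.99]
print([discretize(x, 0.0, 1.0, 4) for x in samples])
```

Here the resolution is the common bin width, $(hi - lo)/n\_bins$, which plays the role of $\Delta$ for a scalar source.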

2.2 Entropy rate of dynamical systems and stochastic processes 

Let $(\Omega, \mathcal{F}, \mu)$ be a probability space and $f : \Omega \to \Omega$ a $\mu$-preserving
transformation, i.e., $\mu(f^{-1}A) = \mu(A)$ for all $A \in \mathcal{F}$. Given the dynamical system
$(\Omega, \mathcal{F}, \mu, f)$ and a finite partition $\alpha = \{A_1, \ldots, A_{|\alpha|}\} \subset \mathcal{F}$ of $\Omega$, the entropy of
$f$ with respect to $\alpha$ is defined as

$$h_\mu(f, \alpha) := \lim_{L\to\infty} \frac{1}{L} H_\mu\Big(\bigvee_{i=0}^{L-1} f^{-i}\alpha\Big), \tag{3}$$

where $\bigvee_{i=0}^{L-1} f^{-i}\alpha = \{\bigcap_{i=0}^{L-1} f^{-i}A_{j_i}\}$ is the least common refinement of the
partitions $\alpha, f^{-1}\alpha, \ldots, f^{-L+1}\alpha$ and $H_\mu(\beta) := -\sum_{j=1}^{|\beta|} \mu(B_j) \log \mu(B_j)$ for any
finite partition $\beta = \{B_1, \ldots, B_{|\beta|}\} \subset \mathcal{F}$. The metric or Kolmogorov-Sinai
entropy rate of the map $f$ is then defined as

$$h_\mu(f) := \sup_\alpha h_\mu(f, \alpha). \tag{4}$$

The convergence in (3) can be proved to be monotonically decreasing [6].
Assuming logarithms base 2 everywhere herein, $h_\mu(f)$ has units of bits per
symbol or time unit, if $n$ is interpreted as discrete time. By convention,
$0 \cdot \log 0 := \lim_{x\to 0^+} x \log x = 0$. In an information-theoretical setting, $h_\mu(f, \alpha)$
represents the long-term average of the information gained per unit time with
respect to a certain partition and $h_\mu(f)$ the maximum information per unit
time available from any stationary process generated by the source, typically
equal to the sum of the positive Lyapunov exponents by the Pesin theorem. If
there exists a finite $\gamma$ such that $h_\mu(f, \gamma) = h_\mu(f)$, then $\gamma$ is called a generator,
or generating partition, of $f$.

Given a discrete alphabet source $\mathbf{S} = (S_n)$ with model $(A^{\mathbb{N}}, \mathcal{Z}, m, \sigma)$, the
(Shannon) entropy of the random variables $S_1^L := S_1 \ldots S_L$ is

$$H_m(S_1^L) := H_m\Big(\bigvee_{i=0}^{L-1} \sigma^{-i}\zeta\Big),$$

where $\zeta = \{C_1, \ldots, C_{|A|}\}$ is the partition of $A^{\mathbb{N}}$ consisting of the basic 'cylinder
sets' $C_i = \{s \in A^{\mathbb{N}} : s_1 = a_i\}$, $1 \le i \le |A|$. According to (2),

$$H_m(S_1^L) = -\sum \Pr(s_1, \ldots, s_L) \log \Pr(s_1, \ldots, s_L),$$

and, correspondingly, the entropy rate (or uncertainty) of the source is defined
as $h_m(\mathbf{S}) := h_m(\sigma, \zeta) = h_m(\sigma)$ since $\zeta$ is a (one-sided) generator of $\mathcal{Z}$, i.e.,

$$h_m(\mathbf{S}) = \lim_{L\to\infty} \frac{1}{L} H_m(S_1^L).$$

In other words, the Shannon entropy rate of $\mathbf{S}$ is, by definition, the Kolmogorov-Sinai
entropy rate of its sequence space model. This explains our using 'metric'
to refer to both concepts, independently of the random or deterministic nature
of the system considered. Sometimes we will also use the entropy of order $L$ of
the source,

$$\bar{H}_m(S_1^L) := \frac{1}{L} H_m(S_1^L),$$

so that $h_m(\mathbf{S}) = \lim_{L\to\infty} \bar{H}_m(S_1^L)$. In general, $S_i^j$ stands for the string $S_i \ldots S_j$.
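To make the normalized block entropy concrete, the sketch below estimates $\frac{1}{L} H_m(S_1^L)$ from a single observed symbol sequence by counting overlapping words of length $L$ (a plug-in estimate). The use of overlapping windows, the function names and the period-2 test signal are assumptions chosen for illustration only.

```python
from collections import Counter
from math import log2


def block_entropy(symbols, L):
    """Plug-in estimate of H(S_1^L) in bits from overlapping length-L words."""
    words = [tuple(symbols[i:i + L]) for i in range(len(symbols) - L + 1)]
    counts = Counter(words)
    n = len(words)
    return -sum(c / n * log2(c / n) for c in counts.values())


def normalized_block_entropy(symbols, L):
    """Block entropy divided by the block length L."""
    return block_entropy(symbols, L) / L


# A period-2 sequence has only two distinct words of any length, so the
# normalized block entropy decays roughly like 1/L towards zero.
periodic = [0, 1] * 500
for L in (1, 2, 4, 8):
    print(L, normalized_block_entropy(periodic, L))
```

For a periodic sequence the block entropy stays bounded while $L$ grows, which is the simplest case of a source with zero entropy rate.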

Other dynamical, statistical or information-theoretical concepts like condi- 
tional entropy, mutual information, ergodicity, mixing properties, etc., are also 
defined via the sequence space model. For example, $\mathbf{S}$ is said to be ergodic if
$(A^{\mathbb{N}}, \mathcal{Z}, m, \sigma)$ is ergodic, i.e., for $C_1, C_2 \in \mathcal{Z}$ with $m(C_1) > 0$, $m(C_2) > 0$, there
exists $n > 0$ such that $m(C_1 \cap \sigma^{-n}C_2) > 0$.

If, more generally, $\mathbf{X}$ is a non-discrete scalar or vector source with outcomes
on an interval $I^d \subset \mathbb{R}^d$, define its differential entropy rate as

$$h_m(\mathbf{X}) := \lim_{\Delta\to 0} \big(h_m(\mathbf{X}^\Delta) + \log\Delta\big), \tag{5}$$

where $\mathbf{X}^\Delta$ is a uniform discretization of $\mathbf{X}$ with resolution scale $\Delta$. The
differential entropy shows how the average rate of information furnished by a
quantization of resolution $\Delta$ differs from $|\log\Delta|$ when $\Delta \to 0$. If $X_1^L$ happens
to have a density function $p(x_1, \ldots, x_L)$ for every $L \ge 1$, then

$$h_m(\mathbf{X}) = -\lim_{L\to\infty} \frac{1}{L} \int_{(I^d)^L} p(x_1, \ldots, x_L) \log p(x_1, \ldots, x_L)\, d^L x.$$
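Definition (5) can be checked numerically in the simplest setting. The sketch below assumes an i.i.d. scalar source on $[0, 1]$ with density $p(x) = 2x$ (an example chosen here, not taken from the text), for which the entropy rates reduce to single-variable entropies. The quantized entropy plus $\log_2 \Delta$ should approach the differential entropy $-\int_0^1 2x \log_2(2x)\,dx = -(\ln 2 - 1/2)/\ln 2$ bits as the bins shrink.

```python
from math import log, log2


def discretized_entropy(n_bins):
    """Shannon entropy (bits) of X with density p(x) = 2x on [0, 1],
    quantized into n_bins equal bins. Bin i has probability
    (i^2 - (i-1)^2) / n_bins^2 because the distribution function is x^2."""
    probs = [(i * i - (i - 1) * (i - 1)) / (n_bins * n_bins)
             for i in range(1, n_bins + 1)]
    return -sum(p * log2(p) for p in probs)


# Closed form of -integral_0^1 2x log2(2x) dx, in bits
exact = -(log(2) - 0.5) / log(2)

for n_bins in (10, 100, 1000):
    delta = 1.0 / n_bins  # bin width, playing the role of the resolution
    print(n_bins, discretized_entropy(n_bins) + log2(delta), exact)
```

The discrete entropy itself diverges like $|\log_2 \Delta|$; only after adding $\log_2 \Delta$ does a finite limit emerge, which is the point of the definition.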



3 Permutations and the metric entropy rate of finite-alphabet sources 

Given a finite-alphabet source $\mathbf{S} = (S_n)$ with model $(A^{\mathbb{N}}, \mathcal{Z}, m, \sigma)$, each
possible permutation of a block of length $L$, e.g., $S_1^L := S_1 \ldots S_L$, can be indexed
as a word of ranks, each an integer in successively larger alphabets. In
particular, define for $n \ge 1$ the rank variable $R_n = |\{S_i, 1 \le i \le n : S_i \le S_n\}| =
\sum_{i=1}^{n} \delta(S_i \le S_n)$, where, as usual, the $\delta$-function of a proposition is 1 if it
holds and 0 otherwise. By definition, $R_n$ is a discrete random variable on $\Omega$
with range $\{1, \ldots, n\}$ and the sequence $\mathbf{R} = (R_n)$ builds a discrete-time
non-stationary process. Then the permutation $\Pi(S_1^L)$ in (1) can also be viewed as
the word $R_1^L = R_1 \ldots R_L$, the relation between both being one-to-one. The
many-to-one relation between $S_1^L$ and $R_1^L$ is written as $R_1^L = \varphi(S_1^L)$.

For example, consider a source $\mathbf{S}$ over the alphabet $\{1, 2, 3\}$. Suppose we
observe the word $S_1^3 = 1, 3, 3$. Then $R_1^3 = \varphi(S_1^3) = 1, 2, 3$ (of course other
strings, e.g., $1, 1, 1$ or $2, 2, 2$, also map to $R_1^3 = 1, 2, 3$) and $\Pi(S_1^3) = (1, 2, 3)$.
The string $1, 3, 3$ could be counted as matching both the ordering $S_1 \le S_2 \le S_3$
and $S_1 \le S_3 \le S_2$. By using ranks, by contrast, the measure associated with
each word is unambiguously associated with one permutation, and the rest of
our development follows this approach.
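The rank map can be written directly from its definition, $R_n = \#\{i \le n : S_i \le S_n\}$. The short Python sketch below does this and reproduces the rank words of the example; the function name `rank_word` is an assumption introduced here.

```python
def rank_word(word):
    """Return the rank word R_1 ... R_L of a block s_1 ... s_L,
    where R_n counts the indices i <= n with s_i <= s_n."""
    return [sum(1 for s in word[:n + 1] if s <= word[n])
            for n in range(len(word))]


# Words with repeated letters collapse onto the same rank word
for w in ([1, 3, 3], [1, 1, 1], [2, 2, 2], [2, 1, 3]):
    print(w, "->", rank_word(w))
```

Since $R_n$ only takes values in $\{1, \ldots, n\}$, there are at most $L!$ distinct rank words of length $L$, one for each permutation.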
The permutation entropy rate of $\mathbf{S}$ is then defined as

$$h^*_m(\mathbf{S}) := \lim_{L\to\infty} \bar{H}_m(R_1^L),$$

alternatively to the definition (1), with

$$\bar{H}_m(R_1^L) = -\frac{1}{L-1} \sum \Pr(r_1^L) \log \Pr(r_1^L)$$

defined to be the permutation entropy of order $L \ge 2$ of $\mathbf{S}$. Remember that
the overbar notation $\bar{H}$ means that the relevant factor of $1/L$ or $1/(L-1)$ has
been included for the entropy of a block of length $L$.
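A plug-in estimate of the permutation entropy of order $L$ follows by counting rank words along an observed sequence. The sketch below is one possible realization under assumptions made here: rank words are taken over overlapping windows of a single long realization (justified only for stationary data), no bias correction is applied, and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def rank_word(word):
    """Rank word of a block: R_n = #{i <= n : s_i <= s_n}."""
    return tuple(sum(1 for s in word[:n + 1] if s <= word[n])
                 for n in range(len(word)))


def permutation_entropy(symbols, L):
    """Plug-in estimate (bits) of -(1/(L-1)) * sum Pr(r) log2 Pr(r),
    where r runs over the rank words of overlapping length-L windows."""
    if L < 2:
        raise ValueError("the order L must be at least 2")
    words = [rank_word(symbols[i:i + L]) for i in range(len(symbols) - L + 1)]
    counts = Counter(words)
    n = len(words)
    return -sum(c / n * log2(c / n) for c in counts.values()) / (L - 1)


# A strictly increasing sequence shows a single rank word, hence zero entropy
print(permutation_entropy(list(range(100)), 3))
# An alternating sequence shows two rank words of length 2, close to 1 bit
print(permutation_entropy([0, 1] * 50, 2))
```

In practice the number of possible rank words grows like $L!$, so the sequence must be much longer than $L!$ for the observed frequencies to be meaningful.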

Let $\sigma_L$ denote the set of permutations of $\{1, \ldots, L\}$ for the time being. We say
that the word $s_1^L$ is of type $\pi \in \sigma_L$ if $r_1^L = \varphi(s_1^L)$ defines the permutation $\pi$.
It follows that $s_{\pi(1)} \le \ldots \le s_{\pi(L)}$. The cylinder sets

$$C_\pi := \{s \in A^{\mathbb{N}} : s_1^L \text{ is of type } \pi\}$$

such that $C_\pi \neq \emptyset$ build a partition of $A^{\mathbb{N}}$ with $m(C_\pi) = \Pr(R_1^L = r_1^L)$,
$1 \le r_k \le k$ for $k = 1, \ldots, L$. Therefore

$$H^*_m(S_1^L) = -\frac{1}{L-1} \sum_{\pi\in\sigma_L} m(C_\pi) \log m(C_\pi). \tag{6}$$

That is, the permutation entropy is sensitive to the measures of non-trivial
order relationships observed in a word, as the Shannon entropy is sensitive to
the measures of the different word values themselves.

Observe as a technical point for later reference that, if

$$Q_\pi := \{s \in A^{\mathbb{N}} : s_{\pi(1)} \le s_{\pi(2)} \le \cdots \le s_{\pi(L)}\},$$

then $C_\pi \neq Q_\pi$ in general, due to words $s_1^L$ with repeated letters: if $s_i \neq s_j$ for every
$1 \le i, j \le L$ with $i \neq j$, then $s \in C_\pi$ if and only if $s \in Q_\pi$.

Lemma 1 Given an ergodic information source S, 




for all $l \ge 1$.

That is, given a sufficiently long tail of previously observed symbols, the later
ranks can be predicted virtually as well as the symbols themselves. Heuristically,
this is because the distribution of the rank variable $R_{k+1}$ for $k$ sufficiently
large depends effectively only on the cumulative distribution function of the
source, approximated by the normalized sum of the indicators $\delta(S_i \le S_{k+1})$.
In turn this means that the information contained in $R_{k+1}$ is the same as the
information in $S_{k+1}$. The proof and an elementary example are given in
Appendix A. With Lemma 1 in hand, we turn to our first main result, the equality
between permutation and metric entropy for finite-alphabet stochastic processes.

Theorem 2 For finite-alphabet ergodic sources $\mathbf{S}$ the permutation entropy
rate exists and equals the metric entropy rate: $h^*_m(\mathbf{S}) = h_m(\mathbf{S})$.



PROOF. We prove inequalities in both directions.

(a) $\limsup_{L\to\infty} H^*_m(S_1^L) \le h_m(\mathbf{S})$. Given $S_1^L$, the corresponding rank variables
are uniquely determined via $R_1^L = \varphi(S_1^L)$. By [4] (Ch 2, exercise 5), $H(\varphi(Z)) \le
H(Z)$ for any discrete random variable $Z$, so $H_m(R_1^L) \le H_m(S_1^L)$ and thus
$\limsup_{L\to\infty} \bar{H}_m(R_1^L) \le \limsup_{L\to\infty} \bar{H}_m(S_1^L) = h_m(\mathbf{S})$.

(b) $\liminf_{L\to\infty} H^*_m(S_1^L) \ge h_m(\mathbf{S})$. There are several ways to prove this
inequality. Consider, for instance,

$$\liminf_{L\to\infty} H^*_m(S_1^L) = \liminf_{L\to\infty} \frac{1}{L} H_m(R_1^L)
 = \liminf_{L\to\infty} \frac{1}{L} \Big[ H_m(R_L | R_1^{L-1}) + \cdots + H_m(R_{L^*+1} | R_1^{L^*}) + H_m(R_1^{L^*}) \Big]$$

for any $L^* < L$, where we have applied the chain rule for entropy. As $R_1^k =
\varphi(S_1^k)$ we apply the data processing inequality $H(Y | \varphi(Z)) \ge H(Y | Z)$ [4] to
all elements of the first term on the rhs:

$$\liminf_{L\to\infty} H^*_m(S_1^L) \ge \liminf_{L\to\infty} \frac{1}{L} \Big[ H_m(R_L | S_1^{L-1}) + \cdots + H_m(R_{L^*+1} | S_1^{L^*}) + H_m(R_1^{L^*}) \Big].$$

By Lemma 1, for any $\epsilon > 0$ there is some $L^*$ such that
$|H_m(S_L | S_1^{L-1}) - H_m(R_L | S_1^{L-1})| < \epsilon$ for $L > L^*$, so

$$\liminf_{L\to\infty} H^*_m(S_1^L) \ge \liminf_{L\to\infty} \Big[ \frac{1}{L} \big( H_m(S_L | S_1^{L-1}) + \cdots + H_m(S_2 | S_1) + H_m(S_1) \big)
 - \frac{L - L^*}{L}\,\epsilon + \frac{1}{L} \big( H_m(R_1^{L^*}) - H_m(S_1^{L^*}) \big) \Big] = h_m(\mathbf{S}) - \epsilon.$$

Since $\epsilon$ is arbitrary, (b) follows.

The existence of the limit and equality follows from (a) and (b). □

More generally, we can only show an inequality for non-ergodic cases, namely,

$$\liminf_{L\to\infty} H^*_m(S_1^L) \ge h_m(\mathbf{S}). \tag{7}$$

The proof of (7) uses the ergodic decomposition of the entropy rate and is
given in Appendix B.



4 Non-discrete information sources

Information sources can also have non-discrete alphabets, although their
outcomes are only observable with a finite precision. In this case, it is well known
that Shannon's entropy rate, defined as the limit over ever finer uniform
quantizations of the source, diverges logarithmically with the quantization scale.
In order to obtain a finite measure of the asymptotic behavior of such
quantizations, one has to resort to the differential entropy rate (5) instead. It turns
out that Theorem 2 can be extended to scalar and vector ergodic non-discrete
sources if entropy is replaced by differential entropy.

Let $\mathbf{X} = (X_n)$ be a scalar or vector ergodic source taking values on an interval
$I^d \subset \mathbb{R}^d$, $d \ge 1$. In case $d > 1$ (vector sources), $I^d$ is supposed to be endowed
with the product (or lexicographical) order: $x < x'$ if $x_k = x'_k$ for $k = d, d-1,
\ldots, d-s > 1$ and $x_{d-s-1} < x'_{d-s-1}$ (other conventions are also possible).
With the equality between permutation and metric entropy rates for ergodic
finite-alphabet sources in hand, we now consider the source $\mathbf{X}$ uniformly discretized
to an alphabet $A^\Delta = \{1, \ldots, N\}$ by means of a partition $\delta = \{\Delta_1, \ldots, \Delta_N\}$
of $I^d$ with $\lambda(\Delta_i) = \lambda(I^d)/N =: \Delta$ for $1 \le i \le N$, where $\lambda$ is, as before,
Lebesgue measure. One can then define the ranks $R_n^\Delta : \Omega \to \{1, \ldots, n\}$ of
blocks of discretized symbols $(X^\Delta)_1^L$ in the known way: $R_n^\Delta = \sum_{i=1}^{n} \delta(X_i^\Delta \le
X_n^\Delta)$, $1 \le n \le L$. If $((A^\Delta)^{\mathbb{N}}, \mathcal{Z}^\Delta, m^\Delta, \sigma)$ is the sequence space model for $\mathbf{X}^\Delta$,
we define the permutation entropy rate at resolution $\Delta$ as usual: $h^*_{m^\Delta}(\mathbf{X}^\Delta) :=
\lim_{L\to\infty} \bar{H}_{m^\Delta}((R^\Delta)_1^L)$. We can now take the limit $\Delta \to 0$ and, analogously to
(5), define the differential permutation entropy rate of $\mathbf{X}$ as

$$h^*_m(\mathbf{X}) := \lim_{\Delta\to 0} \big(h^*_{m^\Delta}(\mathbf{X}^\Delta) + \log\Delta\big).$$

This yields:

Theorem 3 Suppose $\mathbf{X}$ is an ergodic non-discrete source. Then $h^*_m(\mathbf{X}) =
h_m(\mathbf{X})$, that is, the differential permutation and metric entropy rates of $\mathbf{X}$ are
equal.

PROOF. If $(\mathbb{R}^{d\mathbb{N}}, \mathcal{B}, m, \sigma)$ is ergodic, so is $((A^\Delta)^{\mathbb{N}}, \mathcal{Z}^\Delta, m^\Delta, \sigma)$. By Theorem 2,
$h^*_{m^\Delta}(\mathbf{X}^\Delta) = h_{m^\Delta}(\mathbf{X}^\Delta)$, so

$$h^*_m(\mathbf{X}) = \lim_{\Delta\to 0} \big(h_{m^\Delta}(\mathbf{X}^\Delta) + \log\Delta\big) = h_m(\mathbf{X}),$$

where $h_m(\mathbf{X})$ is the metric differential entropy rate of $\mathbf{X}$. □

5 Permutations and the metric entropy of ergodic maps 

In this section we will use our result on finite-alphabet stochastic processes to 
show that the equality between permutation and Kolmogorov-Sinai entropy 
rate applies to ergodic maps on finite-dimensional intervals. 

Let $I^d$ be a proper interval of $\mathbb{R}^d$ endowed with the sigma-algebra $\mathcal{B}|_{I^d} = \mathcal{B} \cap I^d$,
the restriction of the Borel sigma-algebra of $\mathbb{R}^d$ to $I^d$, and let $f : I^d \to I^d$ be a
$\mu$-preserving transformation, with $\mu$ being a measure on $(I^d, \mathcal{B}|_{I^d})$. In order
to define the permutation entropy of $f$, we consider first product partitions

$$\iota = \prod_{k=1}^{d} \{I_{1,k}, \ldots, I_{N_k,k}\}$$

of $I^d$ into $N^d := N_1 \cdots N_d$ subintervals of lengths $\Delta_{j,k}$, $1 \le j \le N_k$, in each
coordinate $k$, defining $\|\iota\| = \max_{j,k} \Delta_{j,k}$. The intervals are lexicographically
ordered in each dimension, i.e., points in $I_{j,k}$ are smaller than points in $I_{j+1,k}$,
and for the multiple dimensions a lexicographic order is defined, $I_{j,k} < I_{j,k+1}$,
so there is an order relation between all the $N^d$ partition elements, and we
can enumerate them with a single index $i \in [1, N^d]$:

$$\iota = \{I_i^d : 1 \le i \le N^d\}, \qquad I_i^d < I_{i+1}^d.$$

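Next define a collection of simple observations $\mathbf{S}^\iota = (S_n^\iota)$ with respect to
$f$ with precision $\|\iota\|$: $S_n^\iota(x) = i$ if $f^n(x) \in I_i^d$, $n = 0, 1, \ldots$ Then $\mathbf{S}^\iota$ is an
ergodic stationary $N^d$-state random process or, equivalently, an ergodic source
on $(I^d, \mathcal{B}|_{I^d}, \mu)$ with finite alphabet $A^\iota = \{1, \ldots, N^d\}$ and output probability
distribution $m = \mu \circ \phi^{-1}$ with $\phi(x) = (S_0^\iota(x), S_1^\iota(x), \ldots) \in (A^\iota)^{\mathbb{N}}$, so that

$$\Pr(i_0, \ldots, i_{n-1}) = \Pr(S_0^\iota = i_0, \ldots, S_{n-1}^\iota = i_{n-1})
 = m\{s \in (A^\iota)^{\mathbb{N}} : s_0 = i_0, \ldots, s_{n-1} = i_{n-1}\}
 = \mu\big(I_{i_0}^d \cap f^{-1} I_{i_1}^d \cap \ldots \cap f^{-n+1} I_{i_{n-1}}^d\big). \tag{8}$$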



In fact, $f$ and the left shift $\sigma$ on the sequences $(S_n^\iota(x))$ are conjugate. A
simple implementation of $\mathbf{S}^\iota$ for $I = [0, 1[$ and $N = 10^k$ is the following:
$S_n^\iota(x) = \lfloor f^n(x) \cdot 10^k \rfloor + 1$ with $I_i = [(i-1)10^{-k}, i\,10^{-k}[$ for
$1 \le i \le N$. We see that using simple observations as a finite alphabet
measurement with respect to $f$ provides a direct link between the entropies of
$\mathbf{S}^\iota$ and $f$. Accordingly, we define the permutation entropy rate of $f$ as

$$h^*_\mu(f) := \lim_{\|\iota\|\to 0} h^*_m(\mathbf{S}^\iota), \tag{9}$$

provided the limit exists. With this definition, and Theorem 2, we may prove
the principal result on ergodic dynamical systems.
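The simple observations with $N = 10^k$ can be generated along an orbit as in the following Python sketch. The choice of the logistic map with parameter 4 as the map $f$, the initial point and the function names are assumptions made here purely for illustration.

```python
from math import floor


def logistic(x, a=4.0):
    """Logistic map on [0, 1]; the parameter value a = 4 is an assumed example."""
    return a * x * (1.0 - x)


def simple_observations(x0, n_steps, k, f=logistic):
    """Return S_0, ..., S_{n_steps-1} with S_n = floor(f^n(x0) * 10^k) + 1,
    i.e. the index of the bin [(i-1) 10^-k, i 10^-k[ visited at time n."""
    n_bins = 10 ** k
    symbols = []
    x = x0
    for _ in range(n_steps):
        # min() keeps the symbol in {1, ..., N} if round-off yields x == 1.0
        symbols.append(min(floor(x * n_bins) + 1, n_bins))
        x = f(x)
    return symbols


print(simple_observations(0.35, 10, k=1))
print(simple_observations(0.35, 10, k=3))
```

Increasing `k` refines the partition, which corresponds to letting the norm of the partition tend to zero in definition (9).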

Theorem 4 If $f : I^d \to I^d$ is ergodic, then $h^*_\mu(f) = h_\mu(f)$. In words, the
permutation entropy rate of ergodic maps equals the metric entropy rate.



PROOF. If $h_\mu(f) = \infty$, the statement follows in general (also for non-ergodic
maps) from (7). If $h_\mu(f) < \infty$, we have (see (8))

$$h_m(\mathbf{S}^\iota) = -\lim_{n\to\infty} \frac{1}{n} \sum \Pr(i_0, \ldots, i_{n-1}) \log \Pr(i_0, \ldots, i_{n-1})
 = \lim_{n\to\infty} \frac{1}{n} H_\mu\Big(\bigvee_{i=0}^{n-1} f^{-i}\iota\Big) = h_\mu(f, \iota).$$

On the other hand, $h_m(\mathbf{S}^\iota) = h^*_m(\mathbf{S}^\iota)$ by Theorem 2 (since $\mathbf{S}^\iota$ is ergodic with
respect to the measure $m$).

Let $\gamma$ denote the finite generating partition of $f$ that, according to Krieger's
Theorem [13], must exist (due to $f$'s ergodicity and finite metric entropy), so
that $h_\mu(f) = h_\mu(f, \gamma) = h_m(\mathbf{S}^\gamma)$. We claim that

$$\lim_{\|\iota\|\to 0} h_m(\mathbf{S}^\iota) = h_m(\mathbf{S}^\gamma)$$

and hence

$$h^*_\mu(f) = \lim_{\|\iota\|\to 0} h^*_m(\mathbf{S}^\iota) = \lim_{\|\iota\|\to 0} h_m(\mathbf{S}^\iota) = h_\mu(f).$$

Case 1. Suppose that the elements of $\gamma$ are ($d$-dimensional) intervals or, more
generally, that all elements of $\gamma$ consist of a finite number of intervals. In
either case, taking if necessary a refinement of $\gamma$ (thus, also a generator that
we call $\gamma$ as well) so that $\gamma$ becomes a product partition $\iota$ of $I^d$, we deduce
$h_m(\mathbf{S}^\iota) = h_m(\mathbf{S}^\gamma) = h_\mu(f)$ and the same is true for any further refinement of
$\iota$.

Case 2. If, otherwise, some component of $\gamma$ consists (modulo 0) of infinitely
many intervals, we can define a sequence of ever finer partitions $(\iota_n)_{n\in\mathbb{N}}$ of $I^d$
that, after a hypothetical refinement, can be assumed without restriction to
be product partitions (Case 1) such that $\mathcal{A}(\iota_n)$, the finite sigma-algebras
generated by the $\iota_n$, build an increasing sequence and $\bigvee_{n=1}^{\infty} \mathcal{A}(\iota_n) = \mathcal{B}|_{I^d}$ (mod 0).
Then $h_\mu(f) = \lim_{n\to\infty} h_\mu(f, \iota_n)$ [13].

This proves our claim and the theorem. □



6 On the definition of permutation entropy rate for dynamical systems

The original definition of Bandt, Keller and Pompe (BKP) [2] of the permutation
entropy of maps on intervals $I \subset \mathbb{R}$ involves partitions of the form

$$P_\pi = \{x \in I : f^{\pi(0)}(x) < f^{\pi(1)}(x) < \ldots < f^{\pi(L-1)}(x)\},$$

where $\pi \in \sigma_L$, here the set of permutations of $\{0, 1, \ldots, L-1\}$, $L \ge 2$. In fact,
if $f$ is supposed to be piecewise monotone as in [2] or just ergodic, as in our
case, it is easy to show that

$$\mathcal{P}_L = \{P_\pi \neq \emptyset : \pi \in \sigma_L\}$$

is a partition of $I$ (except maybe for a set of points of measure zero). BKP
then define the permutation entropy of order $L$ as

$$H^{*BKP}_\mu(f, L) = -\frac{1}{L-1} \sum_{\pi\in\sigma_L} \mu(P_\pi) \log \mu(P_\pi) \tag{10}$$

(compare to (6)) and their permutation entropy rate of $f$ to be

$$h^{*BKP}_\mu(f) := \lim_{L\to\infty} H^{*BKP}_\mu(f, L), \tag{11}$$

provided the limit exists. They prove $h^{*BKP}_\mu(f) = h_\mu(f)$ for piecewise monotone
maps on intervals of $\mathbb{R}$, but in the more general case of ergodic maps it
seems that only the inequality $\liminf_{L\to\infty} H^{*BKP}_\mu(f, L) \ge h_\mu(f)$, formally similar
to (7), can be proved, which we have done in Appendix C for ergodic maps on
$d$-dimensional intervals. Comparing such particular results to the generality of
Theorem 4, we may conclude that our definition (9) of permutation entropy
rate offers a substantial advantage.

Note that the central distinction, which makes our formulation easier and
more natural, is that (9) takes the limit of infinitely long conditioning ($L \to \infty$)
first, and the discretization limit ($\Delta \to 0$) last, similarly to the Kolmogorov-Sinai
entropy rate, and as opposed to (11), where an explicit discretization was not
taken. We conjecture that for non-pathological dynamical systems of the sort
one might observe in Nature the two formulations are equivalent, but there
are likely to be some non-trivial technicalities involved in a rigorous analysis.
For example, [11] shows a 1-dimensional map with an infinite number of
monotonicity intervals, where the topological entropy rate and the permutation
version of the topological entropy rate (i.e., counting simply the number
of distinct permutations with non-zero measure, and not weighting them by
their measure) are unequal: $h^*_0(f) = \lim_{L\to\infty} \frac{1}{L-1} \log |\mathcal{P}_L| \neq h_0(f)$.



7 Numerical examples and Discussion 

As a by-product of our result, the practitioner of time-series analysis will find
an alternative way to envision or, eventually, numerically estimate the entropy
rate of real sources. It is worth recalling that the entropy of information
sources can be measured by a variety of techniques that go beyond counting
word statistics and comprise different definitions of 'complexities' such as, for
example, counting the patterns along a digital (or digitized) data sequence
[10,14,1]. Bandt and Pompe refer, in [3], to the permutation entropy of time
series as complexity. That the entropy rate can also be computed by counting
permutations shows once again that it is so general a concept that it can be
captured with different and seemingly blunt approaches.

We demonstrate numerical results on time series from the logistic map
$x_{n+1} = A x_n (1 - x_n)$. Figure 1 shows an estimate of the permutation
entropy rate on noise-free data as a function of $A$, comparing the Lyapunov
exponent (computed from the orbit knowing the equation of motion) to the
permutation entropy. To be precise, we are estimating $h^*_m(S)$ with $S$
discretized from the logistic map iterated at the discretization of
double-precision numerical representation, i.e., $S$ is the output of a
standard numerical iteration. The entropy estimator of the block ranks was the
plug-in estimator (substituting observed frequencies for probabilities) plus
the classical bias correction, first order in $1/N$. The key unresolved issue
in using permutation entropies for empirical data analysis is, as with standard
Shannon entropy rate estimation, balancing the tension between larger word
lengths $L$, which capture more dependencies, and the loss of sufficient
sampling for good statistics in the ever larger discrete space. The finite-$L$
performance, convergence rate and bias of any specific computational method are
key issues when it comes to accurately estimating the entropy rate of a source
from observed data. It is now appreciated that numerically estimating the
Shannon block entropy from finite data and, especially, the asymptotic entropy
rate, can be surprisingly tricky [12,9,1,7,8]. The theoretical definitions of
entropy rate do not necessarily lead to good statistical methods, and superior
alternatives have been developed over the many years since Shannon. We believe
that some of these ideas may similarly be applicable to the permutation entropy
situation. Figure 2 shows a very simple application of part of the method of
[12], fitting an empirical asymptotic scaling $H^*(X_1^L) = h_{L=\infty} + C/L$
for $L = 13, 14$, and comparing to the block estimate. This procedure shows a
lower bias, but the specific choice of scaling region in $L$ (as with block
entropy) is a key empirical issue, and does not have a generally satisfactory
resolution.

[Figure 1. Horizontal axis: A (dimensionless).]

Fig. 1. (color online) Lyapunov exponent (black line, thick) of logistic map and
permutation entropy rate estimates $h = H^*(X_1^L)$ for $N = 10^5, 10^6$ length
time series from the map (red and black thin lines). The permutation entropy
estimate tracks changes in the Lyapunov exponent (equal to the Kolmogorov-Sinai
entropy rate where nonnegative) well, with a nearly constant bias. Periodic
orbits give a finite permutation entropy, but the rate estimate would tend to
zero given a sufficiently long word.
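The estimation procedure described in this paragraph can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the code used for the figures: we assume that the "classical bias correction, first order in $1/N$" is the Miller-Madow term, that the rate is normalized by $1/(L-1)$, and that rank words are counted through the equivalent ordinal patterns of sliding windows. All function names and the two-point fit are hypothetical.

```python
import math
from collections import Counter


def logistic_orbit(a, n, x0=0.3, burn_in=1000):
    """Iterate x -> a x (1 - x) in double precision and return n values."""
    x = x0
    for _ in range(burn_in):
        x = a * x * (1.0 - x)
    orbit = []
    for _ in range(n):
        x = a * x * (1.0 - x)
        orbit.append(x)
    return orbit


def ordinal_pattern(window):
    """Indices of the window sorted by value (ties broken by time order)."""
    return tuple(sorted(range(len(window)), key=lambda i: (window[i], i)))


def block_entropy_bits(series, L):
    """Plug-in entropy of length-L ordinal patterns plus a 1/N correction."""
    counts = Counter(ordinal_pattern(series[i:i + L])
                     for i in range(len(series) - L + 1))
    n = sum(counts.values())
    plug_in = -sum(c / n * math.log2(c / n) for c in counts.values())
    # Assumed form of the first-order correction (Miller-Madow), in bits.
    return plug_in + (len(counts) - 1) / (2.0 * n * math.log(2.0))


def rate_estimate(series, L):
    """Block estimate of the permutation entropy rate (assumed 1/(L-1) norm)."""
    return block_entropy_bits(series, L) / (L - 1)


def fit_scaling(series, L1, L2):
    """Two-point fit of the ansatz H*(L) = h_inf + C / L."""
    h1, h2 = rate_estimate(series, L1), rate_estimate(series, L2)
    C = (h1 - h2) / (1.0 / L1 - 1.0 / L2)
    return h1 - C / L1, C
```

For example, `rate_estimate(logistic_orbit(3.9, 20000), 4)` gives a block estimate at a short word length, and `fit_scaling(series, 5, 6)` returns the extrapolated value together with the fitted constant. The word lengths used in the text ($L = 13, 14$) require far longer series to be sampled adequately.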

[Figure 2. Legend: block estimator; Strong et al. estimator. Horizontal axis:
$h_{KS}$ (bits).]

Fig. 2. (color online) Block entropy estimate (red points) at $L = 14$ and
Strong et al. [12] fitted estimate (black points) as a function of
$h_{KS} = \lambda$ wherever $\lambda > 0$. The scaling region Ansatz yields
lower bias at the cost of increased variance. The block length and scaling
region were chosen by hand, a significant limitation.

Also important for practical time-series analysis is the usual situation where
observations of a predominantly deterministic source are contaminated with a
small level of observational noise. Here, we recommend that the user fix some
discretization level $\Delta$ characteristic of the noise, and evaluate the
permutation entropies via entropies of rank words evaluated from the
discretized observables. Figure 3 shows an analysis of permutations on
significantly noise-contaminated signals, with no explicit $\Delta$ (i.e., it
is the size of the numerical precision of the computations). The consequence is
that the permutation entropy is heavily dominated by the noise. Figure 4 shows
the restoration of monotonic scaling when an explicit, finite $\Delta = 0.2$ is
used to discretize the data before rank variables are computed. Note that,
because computing ranks involves looking at the difference between two
noise-contaminated variables, when the characteristic noise size is 0.1, as in
this example, an appropriate discretization scale is 0.2.
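The recommendation above (discretize at a scale $\Delta$ tied to the noise, then compute rank words on the discretized values) can be sketched as follows. This is a minimal sketch under our own assumptions: bins are aligned at multiples of $\Delta$ via `floor(x / delta)`, the rank variable counts ties as $R_k = \#\{i \le k : s_i \le s_k\}$, and the rate uses a $1/(L-1)$ normalization. The helper names are hypothetical.

```python
import math
import random
from collections import Counter


def noisy_logistic(a, n, noise_width, seed=0, x0=0.3, burn_in=1000):
    """Logistic orbit observed with uniform zero-mean noise of given width."""
    rng = random.Random(seed)
    x = x0
    for _ in range(burn_in):
        x = a * x * (1.0 - x)
    out = []
    for _ in range(n):
        x = a * x * (1.0 - x)
        out.append(x + rng.uniform(-noise_width / 2, noise_width / 2))
    return out


def discretize(series, delta):
    """Replace each observation by the index of its bin of width delta."""
    return [math.floor(x / delta) for x in series]


def rank_word(window):
    """Rank variables R_k = #{i <= k : s_i <= s_k}; equal values count as ties."""
    return tuple(sum(1 for i in range(k + 1) if window[i] <= window[k])
                 for k in range(len(window)))


def rank_entropy_rate(series, L):
    """Plug-in entropy (bits) of length-L rank words, divided by L - 1."""
    counts = Counter(rank_word(series[i:i + L])
                     for i in range(len(series) - L + 1))
    n = sum(counts.values())
    H = -sum(c / n * math.log2(c / n) for c in counts.values())
    return H / (L - 1)
```

A typical call would be `rank_entropy_rate(discretize(noisy_logistic(3.9, 10**5, 0.1), 0.2), L)`, to be compared with the same quantity computed on the undiscretized observations. How the bins are aligned relative to the data is a free choice that this sketch does not try to optimize.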



[Figure 3. Horizontal axis: A (dimensionless).]

Fig. 3. (color online) Lyapunov exponent (black line, thick) of noise-free
logistic map and permutation entropy rate estimates $h = H^*(X_1^L)$ for
$N = 10^4, 10^5, 10^6$ length time series from the map (blue, red and black thin
lines), contaminated with uniform zero-mean observational noise of width 0.1.
Here, the entropy of the underlying map is nearly obliterated by the effect of
the noise.

For vector-valued sources, we applied lexicographic ordering and construction
of outer product variables in the proof. For analyzing chaotic observed data,
however, it may be acceptable to use just one scalar projection, subject to the
traditional caveats of time-delay embedology. We would expect that for
appropriately mixing sources and generic observation functions, the
Kolmogorov-Sinai entropy estimated through that scalar still equals the true
value, and likewise so might the permutation entropy rate. We have found
numerically that this appears to work in practice. With a direct
higher-dimensional product space, the undersampling issue becomes even more
difficult with increasing $L$; hence using scalars, as in a time-delay
embedding, may turn out to be a superior approach for observed time series of
higher-dimensional sources.

for proof-reading some parts of the paper. J.M.A. was partially supported by
the Spanish Ministry of Education and Science, grant GRUPOS 04/79. M.K.

A Ergodic finite-alphabet information sources



[Figure 4. Horizontal axis: A (dimensionless).]

Fig. 4. (color online) Lyapunov exponent (black line, thick) of noise-free
logistic map and permutation entropy rate estimates $h = H^*(X_1^L)$ for
$N = 10^4, 10^5, 10^6$ length time series from the map (blue, red and black thin
lines), contaminated with uniform zero-mean observational noise of width 0.1,
and discretized to $\Delta = 0.2$. With this discretization the entropy estimate
tracks the macroscopic entropy from the dynamics much better, though the bias is
increased, as expected, since the entropy due to noise still has some effect.

Proof of Lemma 1. Given an ergodic information source $S$, we must show that

$$\lim_{k\to\infty} H_m(R_{k+1}^{k+l} \mid S_1^k) = \lim_{k\to\infty} H_m(S_{k+1}^{k+l} \mid S_1^k)$$

for all $l \ge 1$. Consider $R_{k+1} = \sum_{i=1}^{k+1} \delta(S_i \le S_{k+1})$.
For $a \in \{1, \dots, N\}$ define the sample frequency of the letter $a$ in the
word $S_1^{k+1}$ to be

$$\vartheta_{k+1}(a) = \frac{1}{k+1} \sum_{i=1}^{k+1} \delta(S_i = a).$$

With the help of $\vartheta_{k+1}(a)$ we may express $R_{k+1}$ in terms of $S_i$,
$1 \le i \le k+1$, namely,

$$R_{k+1}(S_{k+1}) = (k+1) \sum_{a=1}^{S_{k+1}} \vartheta_{k+1}(a),$$

where we assume the outcomes $S_1^{k+1}$ to be known. Then, the identity

$$\Pr(R_{k+1} = y) = \sum_{q=1}^{N} \Pr(S_{k+1} = q)\, \delta(R_{k+1}(q) = y) \tag{A.1}$$

gives us the probability of observing some $R_{k+1}$ with value
$y \in \{1, \dots, k+1\}$ by means of $\Pr(S_{k+1} = q)$, $1 \le q \le N$. Since,
given $S_1^k$, $R_{k+1}$ is a deterministic function of the random variable
$S_{k+1}$, i.e., $\Pr(R_{k+1} = y \mid S_{k+1} = q) = \delta(R_{k+1}(q) = y)$,
Eq. (A.1) can be seen as an application of the law of total probability.

Without loss of generality, we may first rearrange the sum in (A.1) to consider
only those symbol values $q$ with non-zero $\Pr(S_{k+1} = q)$, numbering
$N' \le N$. Expand the sum,

$$\begin{aligned}
\Pr(R_{k+1} = y) ={}& \Pr(S_{k+1} = 1)\, \delta[y = (k+1)\vartheta_{k+1}(1)] \\
&+ \Pr(S_{k+1} = 2)\, \delta[y = (k+1)(\vartheta_{k+1}(1) + \vartheta_{k+1}(2))] \\
&+ \dots + \Pr(S_{k+1} = N')\, \delta[y = (k+1)(\vartheta_{k+1}(1) + \dots + \vartheta_{k+1}(N'))].
\end{aligned}$$

Suppose all the relevant sample frequencies
$\vartheta_{k+1}(1), \dots, \vartheta_{k+1}(N')$ are greater than zero. This
means that for any $y$, only a single one of the $\delta$-functions can be
nonzero, and hence we have a one-to-one transformation taking non-zero elements
from the distribution $\Pr(S_{k+1})$ without change into some bin for
$\Pr(R_{k+1})$. Since entropy is invariant to a renaming of the bins, and the
remaining zero probability bins add nothing to the entropy, we conclude that,
if $\vartheta_{k+1}(a) > 0$ for all $a$ where the true probability
$\Pr(S_{k+1} = a) > 0$ (i.e., $a = 1, \dots, N'$ after a hypothetical
rearrangement), then $H_m(R_{k+1} \mid S_1^k) = H_m(S_{k+1} \mid S_1^k)$. Because
of the assumed ergodicity, we can make the probability that
$\vartheta_{k+1}(a) = 0$ when $\Pr(S_{k+1} = a) > 0$ arbitrarily small by taking
$k$ to be sufficiently large, and the claim follows for $l = 1$.

This construction can be extended without change to words $S_{k+1}^{k+l}$ of
arbitrary length $l \ge 1$ via

$$\Pr(R_{k+1}^{k+l} = y_1 \dots y_l) = \sum_{q_1, \dots, q_l = 1}^{N'} \left[ \Pr(S_{k+1}^{k+l} = q_1 \dots q_l) \times \delta(R_{k+1}(q_1) = y_1) \times \dots \times \delta(R_{k+l}(q_l) = y_l) \right].$$

Observe that if $\vartheta_{k+1}(a) > 0$ for $1 \le a \le N'$, then the same
happens with $\vartheta_{k+2}(a), \dots, \vartheta_{k+l}(a)$ and
$H_m(R_{k+1}^{k+l} \mid S_1^k) = H_m(S_{k+1}^{k+l} \mid S_1^k)$ follows. Again,
ergodicity guarantees that there exist realizations $S_1^{k+l}$ whose sample
frequencies fulfill the said condition. □
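The two facts used in this proof, namely that the rank of the last symbol equals $(k+1)$ times the cumulative sample frequency up to that symbol, and that the map from the last symbol to its rank is one-to-one exactly when no smaller-or-equal letter is missing from the word, can be checked on small words. The sketch below is illustrative only; the function names are hypothetical and the alphabet is taken to be $\{1, \dots, N\}$ as in the text.

```python
def rank_direct(word):
    """R = number of symbols in the word that are <= its last symbol."""
    last = word[-1]
    return sum(1 for s in word if s <= last)


def sample_frequency(word, a):
    """Relative frequency of the letter a in the word."""
    return sum(1 for s in word if s == a) / len(word)


def rank_from_frequencies(word):
    """R = (k+1) * sum of sample frequencies of the letters 1..last symbol."""
    last = word[-1]
    total = sum(sample_frequency(word, a) for a in range(1, last + 1))
    return round(len(word) * total)


def ranks_by_last_symbol(prefix, alphabet_size):
    """Rank obtained for each possible next symbol q after a fixed prefix."""
    return {q: rank_direct(tuple(prefix) + (q,))
            for q in range(1, alphabet_size + 1)}
```

With the prefix `(1, 2, 3)` every letter of a three-letter alphabet has occurred and the three possible next symbols give three different ranks. With the prefix `(1, 3, 3)` the letter 2 is missing, and the next symbols 1 and 2 are mapped to the same rank, which is the situation that ergodicity makes increasingly improbable as $k$ grows.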

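By way of illustration, suppose that $S_n \in \{0, 1\}$ are independent random
variables with probability $\Pr(S_n = 0) = \Pr(S_n = 1) = \frac{1}{2}$. Given
$S_1^k = s_1 \dots s_k \in \{0, 1\}^k$, set
$N_0 = \#\{s_i = 0 \text{ in } S_1^k\}$, $0 \le N_0 \le k$. Consider the case
$l = 2$ in Lemma 1. There are two possibilities:

(i) $0 \le N_0 < k$. Then

$$\begin{aligned}
S_{k+1}^{k+2} = 0,0 &\Rightarrow R_{k+1}^{k+2} = N_0 + 1,\ N_0 + 2 \\
S_{k+1}^{k+2} = 0,1 &\Rightarrow R_{k+1}^{k+2} = N_0 + 1,\ k + 2 \\
S_{k+1}^{k+2} = 1,0 &\Rightarrow R_{k+1}^{k+2} = k + 1,\ N_0 + 1 \\
S_{k+1}^{k+2} = 1,1 &\Rightarrow R_{k+1}^{k+2} = k + 1,\ k + 2
\end{aligned}$$

Each of these events has the joint probability

$$\Pr(N_0 = \nu,\ R_{k+1}^{k+2} = r_{k+1}^{k+2}) = \frac{1}{2^k} \binom{k}{\nu} \frac{1}{4} = \frac{1}{2^{k+2}} \binom{k}{\nu}$$

and conditional probability

$$\Pr(R_{k+1}^{k+2} = r_{k+1}^{k+2} \mid N_0 = \nu) = \frac{1}{4},$$

where $0 \le \nu \le k - 1$ and $r_{k+1}^{k+2} = (\nu + 1, \nu + 2)$,
$(\nu + 1, k + 2)$, $(k + 1, \nu + 1)$ or $(k + 1, k + 2)$.

(ii) $N_0 = k$. Then

$$\begin{aligned}
S_{k+1}^{k+2} = 0,0 \ \&\ S_{k+1}^{k+2} = 0,1 \ \&\ S_{k+1}^{k+2} = 1,1 &\Rightarrow R_{k+1}^{k+2} = k + 1,\ k + 2 \\
S_{k+1}^{k+2} = 1,0 &\Rightarrow R_{k+1}^{k+2} = k + 1,\ k + 1
\end{aligned}$$

These events have the joint probabilities

$$\begin{aligned}
\Pr(N_0 = k,\ R_{k+1}^{k+2} = (k+1, k+2)) &= \frac{1}{2^k} \cdot \frac{3}{4} = \frac{3}{2^{k+2}} \\
\Pr(N_0 = k,\ R_{k+1}^{k+2} = (k+1, k+1)) &= \frac{1}{2^k} \cdot \frac{1}{4} = \frac{1}{2^{k+2}}
\end{aligned}$$

and conditional probabilities

$$\begin{aligned}
\Pr(R_{k+1}^{k+2} = (k+1, k+2) \mid N_0 = k) &= \frac{3}{4} \\
\Pr(R_{k+1}^{k+2} = (k+1, k+1) \mid N_0 = k) &= \frac{1}{4}
\end{aligned}$$

From (i) and (ii), we get (logarithms to base 2)

$$H_m(R_{k+1}^{k+2} \mid S_1^k) = \left(1 - \frac{1}{2^k}\right) 2 + \frac{1}{2^k}\left(2 - \frac{3}{4}\log 3\right) = 2 - \frac{3 \log 3}{2^{k+2}}.$$

On the other hand,

$$H_m(S_{k+1}^{k+2} \mid S_1^k) = H_m(S_{k+1}^{k+2}) = 2,$$

and so $H_m(R_{k+1}^{k+2} \mid S_1^k)$ and $H_m(S_{k+1}^{k+2} \mid S_1^k)$ are equal
in the limit $k \to \infty$, as guaranteed by Lemma 1.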
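The binary example can be verified by brute-force enumeration for small $k$. The sketch below (with hypothetical function names) uses the rank definition $R_n = \#\{i \le n : S_i \le S_n\}$, averages the entropy of the rank pair over all $2^k$ equiprobable prefixes, and compares the result with the value $2 - 3\log_2 3 / 2^{k+2}$ that follows from weighting the all-zeros prefix (entropy of the distribution $(3/4, 1/4)$) against all other prefixes (2 bits each).

```python
import math
from collections import Counter
from itertools import product


def rank_pair(prefix, s1, s2):
    """Ranks (R_{k+1}, R_{k+2}) of the two symbols appended to the prefix."""
    w1 = prefix + (s1,)
    w2 = w1 + (s2,)
    r1 = sum(1 for s in w1 if s <= s1)
    r2 = sum(1 for s in w2 if s <= s2)
    return (r1, r2)


def conditional_rank_entropy(k):
    """H(R_{k+1}^{k+2} | S_1^k) in bits for i.i.d. fair binary symbols."""
    total = 0.0
    for prefix in product((0, 1), repeat=k):
        # The four continuations (s1, s2) are equally likely.
        counts = Counter(rank_pair(prefix, s1, s2)
                         for s1, s2 in product((0, 1), repeat=2))
        h = -sum(c / 4 * math.log2(c / 4) for c in counts.values())
        total += h / 2 ** k  # each prefix has probability 2^{-k}
    return total


def closed_form(k):
    """Value obtained by weighting the all-zeros prefix against the rest."""
    return 2.0 - 3.0 * math.log2(3.0) / 2 ** (k + 2)
```

The gap to the 2 bits of $H_m(S_{k+1}^{k+2})$ halves with every additional symbol in the prefix, which makes the convergence stated by Lemma 1 explicit in this toy case.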

B Non-ergodic finite-alphabet sources 

In order to deal with the general, non-ergodic case, we appeal to the theorem
on ergodic decompositions [6]: If $\Omega$ is a compact metrizable space and
$f : (\Omega, \mathcal{F}, \mu) \to (\Omega, \mathcal{F}, \mu)$ is continuous,
then there is a partition of $\Omega$ into $f$-invariant subsets $\Omega_w$,
each equipped with a sigma-algebra $\mathcal{F}_w$ and a probability measure
$\mu_w$, such that $f$ acts ergodically on each
$(\Omega_w, \mathcal{F}_w, \mu_w)$, the indexing set being another probability
space $(W, \mathcal{G}, \nu)$ (in fact, a Lebesgue space). Furthermore,

$$\mu(E) = \int_W \int_E d\mu_w \, d\nu(w) = \int_W \mu_w(E)\, d\nu(w) \qquad (E \in \mathcal{F}).$$

The family $\{\mu_w : w \in W\}$ is called the ergodic decomposition of $\mu$.

If $\sigma$ is the shift on the (compact, metric) sequence space
$(A^{\mathbb{N}}, \mathcal{Z}, m)$, the indexing set can be taken to be
$A^{\mathbb{N}}$ itself, i.e.,

$$m(C) = \int_{A^{\mathbb{N}}} \int_C dm_s \, dm(s) = \int_{A^{\mathbb{N}}} m_s(C)\, dm(s) \qquad (C \in \mathcal{Z}), \tag{B.1}$$

where $m_{\sigma s} = m_s$ [5]. This result shows that any source which is not
ergodic can be represented as a mixture of ergodic subsources. The next lemma
states that such a decomposition holds also for the entropy rate.

Lemma 5 (Ergodic Decomposition of the Entropy Rate [5]) Let
$(A^{\mathbb{N}}, \mathcal{Z}, m, \sigma)$ be the sequence space model of a
stationary finite alphabet source $S = (S_n)$. Let
$\{m_s : s \in A^{\mathbb{N}}\}$ be the ergodic decomposition of $m$. If
$h_{m_s}(S)$ is $m$-integrable, then

$$h_m(S) = \int_{A^{\mathbb{N}}} h_{m_s}(S)\, dm(s). \tag{B.2}$$

Theorem 6 Under the assumptions of Lemma 5,
$\liminf_{L\to\infty} H^*_m(S_1^L) \ge h_m(S)$ holds for any finite alphabet
source $S$.



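PROOF. Fix $L \ge 2$. From (6) and (B.1),

$$\begin{aligned}
H^*_m(S_1^L) &= -\frac{1}{L-1} \sum_{\pi \in \mathcal{S}_L} \left( \int_{A^{\mathbb{N}}} m_s(C_\pi)\, dm(s) \right) \log \left( \int_{A^{\mathbb{N}}} m_s(C_\pi)\, dm(s) \right) \\
&\ge -\frac{1}{L-1} \sum_{\pi \in \mathcal{S}_L} \left( \int_{A^{\mathbb{N}}} m_s(C_\pi) \log m_s(C_\pi)\, dm(s) \right) \qquad \text{(B.3)} \\
&= \int_{A^{\mathbb{N}}} \left[ -\frac{1}{L-1} \sum_{\pi \in \mathcal{S}_L} m_s(C_\pi) \log m_s(C_\pi) \right] dm(s),
\end{aligned}$$

where in (B.3) we have used Jensen's inequality,

$$\Phi\left( \int_{A^{\mathbb{N}}} f\, dm \right) \le \int_{A^{\mathbb{N}}} \Phi \circ f\, dm,$$

with $\Phi(t) = t \log t$ convex in $[0, \infty)$ and $f(s) = m_s(C_\pi) \ge 0$.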
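Therefore,

$$\begin{aligned}
\liminf_{L\to\infty} H^*_m(S_1^L) &\ge \liminf_{L\to\infty} \int_{A^{\mathbb{N}}} H^*_{m_s}(S_1^L)\, dm(s) \qquad &\text{(B.4)} \\
&\ge \int_{A^{\mathbb{N}}} \left( \liminf_{L\to\infty} H^*_{m_s}(S_1^L) \right) dm(s) \qquad &\text{(B.5)} \\
&= \int_{A^{\mathbb{N}}} h^*_{m_s}(S)\, dm(s), \qquad &\text{(B.6)}
\end{aligned}$$

where we have applied Fatou's lemma in (B.5) to the sequence of positive and
$m$-measurable functions $H^*_{m_s}(S_1^L)$. Observe that $h^*_{m_s}(S)$ exists
for all $s \in A^{\mathbb{N}}$ (and is $m$-integrable as a function of $s$)
since $h^*_{m_s}(S) = h_{m_s}(S)$ by Theorem 1 ($\sigma$ acts ergodically on
$(A^{\mathbb{N}}, \mathcal{Z}_s, m_s)$). Therefore,

$$\liminf_{L\to\infty} H^*_m(S_1^L) \ge \int_{A^{\mathbb{N}}} h_{m_s}(S)\, dm(s) = h_m(S)$$

by (B.2). □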

Theorem 6 and Eqs. (B.4) and (B.6) yield:

Corollary 7 If $h^*_m(S) = \lim_{L\to\infty} H^*_m(S_1^L)$ exists for a
non-ergodic finite-alphabet source $S$, then $h^*_m(S) \ge h_m(S)$ and
$h^*_m(S) \ge \int_{A^{\mathbb{N}}} h^*_{m_s}(S)\, dm(s)$.
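The Jensen step behind these inequalities can be illustrated numerically on the simplest non-ergodic source: a mixture of two i.i.d. binary sources with different biases, whose ergodic components are the two i.i.d. sources themselves. The sketch below is our own illustration with hypothetical function names. It computes the exact distribution of length-$L$ rank words for each component, forms the mixture, and compares the entropy of the mixture with the mixture of the entropies.

```python
import math
from collections import defaultdict
from itertools import product


def rank_word(word):
    """Rank variables R_k = #{i <= k : s_i <= s_k} for k = 1..L."""
    return tuple(sum(1 for i in range(k + 1) if word[i] <= word[k])
                 for k in range(len(word)))


def rank_word_distribution(p_one, L):
    """Exact rank-word distribution for i.i.d. bits with Pr(1) = p_one."""
    dist = defaultdict(float)
    for word in product((0, 1), repeat=L):
        prob = math.prod(p_one if s == 1 else 1.0 - p_one for s in word)
        dist[rank_word(word)] += prob
    return dict(dist)


def mixture(dists, weights):
    """Distribution of the non-ergodic mixture of the given components."""
    mix = defaultdict(float)
    for d, w in zip(dists, weights):
        for key, p in d.items():
            mix[key] += w * p
    return dict(mix)


def entropy_bits(dist):
    """Shannon entropy in bits of a probability dictionary."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)
```

For instance, with biases 0.1 and 0.9 and equal weights, `entropy_bits(mixture([d1, d2], [0.5, 0.5]))` is never smaller than `0.5 * entropy_bits(d1) + 0.5 * entropy_bits(d2)`, which is the finite-$L$ inequality that (B.3) expresses for a general ergodic decomposition.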



C Interval maps 

Suppose first that $I$ is a one-dimensional interval and $f : I \to I$ an
ergodic and $\mu$-preserving transformation, where $\mu$ is a measure on
$(I, \mathcal{B} \cap I)$, $\mathcal{B}$ being the Borel sigma-algebra of
$\mathbb{R}$.

Lemma 8 If $f : I \to I$ is ergodic and $h_\mu(f) < \infty$, then
$\liminf_{L\to\infty} H^*_\mu(f, L) \ge h_\mu(f)$. See (10) for the definition
of $H^*_\mu(f, L)$. It follows that $h^{*BKP}_\mu(f) \ge h_\mu(f)$.

PROOF. Let $\gamma$ be a finite generator of $f$ (Krieger's Theorem, [13]). We
split the proof in two parts. In the first part we follow the approach of
[2, Sect. 3].

Case 1. Suppose that the elements of $\gamma$ are connected sets (intervals)
or, more generally, that all elements of $\gamma$ consist of a finite number of
intervals. In either case, taking if necessary a refinement of $\gamma$ (thus,
also a generator) that we call $\gamma$ as well, we write without restriction
$\gamma = \{I_j, 1 \le j \le |\gamma|\}$, where $I_j \subset I$ are intervals.
This being the case, let $c_1 < c_2 < \dots < c_{|\gamma|-1}$ be the points that
subdivide the interval $I = [a, b]$ into the $|\gamma|$ intervals $I_j$ of the
generator $\gamma$. We consider a fixed $P_\pi \in \mathcal{P}^*_L$ and show
that it can intersect at most $(L+1)^{|\gamma|-1}$ sets of the partition
$\gamma_0^{L-1} := \bigvee_{i=0}^{L-1} f^{-i}\gamma = \{\bigcap_{i=0}^{L-1} f^{-i}(I_{j_i})\}$
with $I_{j_0}, \dots, I_{j_{L-1}} \in \gamma$. For $x \in P_\pi$, let $A_L[x]$
denote the set in $\gamma_0^{L-1}$ that contains $x$. Thus, $A_L[x]$ can be
written as
$I_{j_0} \cap f^{-1}(I_{j_1}) \cap \dots \cap f^{-(L-1)}(I_{j_{L-1}})$ with
$I_{j_0}, \dots, I_{j_{L-1}} \in \gamma$, so that it can be specified by the
$L$-tuple $j[x] = (j_0, \dots, j_{L-1}) \in \{1, \dots, |\gamma|\}^L$.

Now, $\pi$ is given by inequalities $x_{k_1} < \dots < x_{k_L}$ with
$\{k_1, \dots, k_L\} = \{0, \dots, L-1\}$ and $x_k = f^k(x)$. For each
$x \in P_\pi$ we can extend these inequalities so that they give the common
order of the $c_r$ and the $x_{k_l}$, where $1 \le r \le |\gamma| - 1$ and
$1 \le l \le L$. It follows that there are at most $(L+1)^{|\gamma|-1}$ possible
extended orders since each $c_r$ has $L + 1$ possible bins to go among the
$x_{k_l}$ (as $x$ varies in $P_\pi$, the $L$ points $x_{k_l}$ defining the bins
move but do not cross each other). Moreover, when we know the common order of
the $c_r$ and $x_{k_l}$, then $j[x]$ is uniquely determined (since
$c_{j-1} < x_k < c_j$ implies $x_k \in I_j$ and thus $x \in f^{-k}(I_j)$, with
$1 \le j \le |\gamma|$, $c_0 = a$ and $c_{|\gamma|} = b$).

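Each $P_\pi \in \mathcal{P}^*_L$ is then the union of at most
$(L+1)^{|\gamma|-1}$ sets $A_k \in \gamma_0^{L-1} \vee \mathcal{P}^*_L$ with
total measure $\mu(P_\pi)$. Hence,

$$\begin{aligned}
-\sum_{k=1}^{(L+1)^{|\gamma|-1}} \mu(A_k) \log \mu(A_k) &\le -(L+1)^{|\gamma|-1}\, \frac{\mu(P_\pi)}{(L+1)^{|\gamma|-1}} \log \frac{\mu(P_\pi)}{(L+1)^{|\gamma|-1}} \\
&= -\mu(P_\pi) \log \mu(P_\pi) + (|\gamma| - 1)\, \mu(P_\pi) \log(L+1)
\end{aligned}$$

and therefore, summing over all $\pi \in \mathcal{S}_L$,

$$H_\mu(\gamma_0^{L-1}) \le H_\mu(\gamma_0^{L-1} \vee \mathcal{P}^*_L) \le H_\mu(\mathcal{P}^*_L) + (|\gamma| - 1) \log(L+1). \tag{C.1}$$

It follows

$$\frac{1}{L-1} H_\mu(\mathcal{P}^*_L) \ge \frac{1}{L-1} \left[ H_\mu(\gamma_0^{L-1}) - (|\gamma| - 1) \log(L+1) \right]$$

and

$$\liminf_{L\to\infty} \frac{1}{L-1} H_\mu(\mathcal{P}^*_L) \ge h_\mu(f) \tag{C.2}$$

since $\gamma$ is a generator of $f$. Definition (10) completes the proof in
this case.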
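Inequality (C.1) rests on a purely combinatorial fact (an ordinal pattern together with the position of each cut point among the sorted values fixes the symbolic word), so it also holds for empirical distributions computed from a single orbit. The following sketch, with hypothetical function names, checks this for the logistic map and a partition of $[0, 1]$ defined by a sorted list of cut points; the parameter value $A = 3.9$ and the orbit length are arbitrary choices for illustration.

```python
import bisect
import math
from collections import Counter


def logistic_orbit(a, n, x0=0.3, burn_in=1000):
    """Iterate x -> a x (1 - x) and return n values after a transient."""
    x = x0
    for _ in range(burn_in):
        x = a * x * (1.0 - x)
    orbit = []
    for _ in range(n):
        x = a * x * (1.0 - x)
        orbit.append(x)
    return orbit


def entropy_bits(counts):
    """Plug-in Shannon entropy in bits from a Counter of occurrences."""
    n = sum(counts.values())
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def c1_sides(orbit, L, cut_points):
    """Empirical left and right sides of (C.1); cut_points must be sorted."""
    symbol_words, patterns = Counter(), Counter()
    for i in range(len(orbit) - L + 1):
        w = orbit[i:i + L]
        # Symbolic word: index of the partition interval containing each value.
        symbol_words[tuple(bisect.bisect_right(cut_points, x) for x in w)] += 1
        # Ordinal pattern of the same window (ties broken by time order).
        patterns[tuple(sorted(range(L), key=lambda j: (w[j], j)))] += 1
    lhs = entropy_bits(symbol_words)
    # len(cut_points) plays the role of |gamma| - 1.
    rhs = entropy_bits(patterns) + len(cut_points) * math.log2(L + 1)
    return lhs, rhs
```

Calling `c1_sides(logistic_orbit(3.9, 5000), 5, [0.5])` returns the two sides for the binary partition at the critical point. Dividing both by $L - 1$ shows how the logarithmic correction term loses weight as $L$ grows, which is what drives (C.2).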

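Case 2. If some component of $\gamma$ consists of infinitely many intervals, we
can define a sequence of interval partitions $(\gamma_n)_{n \in \mathbb{N}}$
(Case 1) such that $\mathcal{A}(\gamma_n)$, the finite sigma-algebras generated
by the $\gamma_n$, build an increasing sequence and
$\bigvee_{n=1}^{\infty} \mathcal{A}(\gamma_n) = \mathcal{B}$ (mod 0). Then
$h_\mu(f) = \lim_{n\to\infty} h_\mu(f, \gamma_n)$ [13].

We claim that, also in this case, Eq. (C.2) holds. Otherwise, there is some
$\varepsilon > 0$ such that, for every $L \ge 2$, there exists $L' > L$ with

$$\frac{1}{L'-1} H_\mu(\mathcal{P}^*_{L'}) < h_\mu(f) - \varepsilon. \tag{C.3}$$

Take now $n_0$ such that $|h_\mu(f) - h_\mu(f, \gamma_n)| < \varepsilon$ for all
$n \ge n_0$. From (C.3) it follows

$$\frac{1}{L'-1} H_\mu(\mathcal{P}^*_{L'}) < h_\mu(f, \gamma_{n_0}) \le \frac{1}{L'-1} H_\mu((\gamma_{n_0})_0^{L'-1})$$

because $\frac{1}{L} H_\mu((\gamma_{n_0})_0^{L-1})$ decreases monotonically to
$h_\mu(f, \gamma_{n_0})$. Use now (C.1) to deduce

$$h_\mu(f, \gamma_{n_0}) < h_\mu(f) - \varepsilon + \frac{|\gamma_{n_0}| - 1}{L' - 1} \log(L' + 1).$$

But the last term can be made arbitrarily small because the $L'$ fulfilling
(C.3) form an unbounded subsequence and $n_0$ is independent of $L'$, whereas
$h_\mu(f, \gamma_{n_0}) > h_\mu(f) - \varepsilon$ by the choice of $n_0$. This
contradiction proves our claim and completes the proof. □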

More generally, let $I^d$ now be a proper, lexicographically ordered interval
of $\mathbb{R}^d$.

Theorem 9 Let $f$ be an ergodic interval map in $\mathbb{R}^d$ fulfilling the
above assumptions. If $h_\mu(f) < \infty$, then
$\liminf_{L\to\infty} H^*_\mu(f, L) \ge h_\mu(f)$, where the permutation entropy
is defined by means of the product order of $\mathbb{R}^d$.

Proof outline. As in Lemma 8, we again split the proof in two cases. If (Case 1)
the generating partition is a product partition or can be refined to a product
partition (whose elements are, without restriction, lexicographically ordered),
then the same approach used for one-dimensional intervals works through to
Eq. (C.2). Otherwise (Case 2), each element of $\gamma$ is the countable union
of disjoint intervals. These allow us to define (after a possible refinement) a
sequence of product partitions $(\gamma_n)_{n \in \mathbb{N}}$ (Case 1) such
that $h_\mu(f) = \lim_{n\to\infty} h_\mu(f, \gamma_n)$. The proof that
$\liminf_{L\to\infty} H^*_\mu(f, L) \ge h_\mu(f)$ is then completed again by
contradiction. □
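For vector-valued data, ordinal patterns under a lexicographic order are straightforward to compute, since Python already compares tuples lexicographically. The sketch below is only an illustration of that ordering with hypothetical function names; it assumes points of $\mathbb{R}^d$ are given as equal-length tuples, breaks ties by time index, and uses a $1/(L-1)$ normalization for the rate.

```python
import math
from collections import Counter


def lexicographic_pattern(window):
    """Ordinal pattern of a window of points in R^d under lexicographic order."""
    # Tuples compare coordinate by coordinate; the index i breaks exact ties.
    return tuple(sorted(range(len(window)),
                        key=lambda i: (tuple(window[i]), i)))


def vector_permutation_entropy_rate(points, L):
    """Plug-in entropy (bits) of length-L lexicographic patterns over L - 1."""
    counts = Counter(lexicographic_pattern(points[i:i + L])
                     for i in range(len(points) - L + 1))
    n = sum(counts.values())
    H = -sum(c / n * math.log2(c / n) for c in counts.values())
    return H / (L - 1)
```

For the three points `(0.5, 0.1)`, `(0.2, 0.9)`, `(0.5, 0.0)` the second point comes first, and the tie in the first coordinate between the other two is resolved by the second coordinate. As noted in the main text, the number of patterns that must be sampled grows quickly with $L$, so a scalar projection may be preferable for observed data.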
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